- Focus on Japanese restaurants in three major cities: New York, Dallas, and San Francisco.
- Data from Yelp API.
12/16/2019
This project explores the ratings and reviews of Japanese restaurants in three of the most popular cities in the US: New York, San Francisco, and Dallas.
First, we explore the rating distributions of restaurants in the different cities.
Then we look at the geographical distribution of the restaurants to see how they cluster within each city. (Please refer to the Shiny app.)
Finally, we explore the review data: the most frequent words in reviews of highly reviewed restaurants, and sentiment word clouds showing how positive and negative words are distributed.
key <- textreadr::read_rtf("API.rtf")
Location: New York City, San Francisco, Dallas
Categories: Restaurants
Term: Japanese
** For the data-reading code, please refer to the read_data.R file.
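As a rough illustration of how the search parameters above map onto a Yelp Fusion business search request, here is a minimal sketch. It is not the code in read_data.R. The helpers `build_yelp_query()` and `fetch_yelp_page()` are hypothetical names, the lowercase `restaurants` category alias and the limit of 50 results per request are assumptions about the API, and the sketch assumes the `httr` package is installed.

```r
# Sketch only: both helpers are hypothetical and are not taken from read_data.R.

# Build the query parameters for one page of search results.
build_yelp_query <- function(location, term = "Japanese",
                             categories = "restaurants",
                             limit = 50, offset = 0) {
  # Assumption: the API accepts at most 50 results per request.
  stopifnot(limit >= 1, limit <= 50, offset >= 0)
  list(term = term, location = location, categories = categories,
       limit = limit, offset = offset)
}

# Send one request. Needs the httr package, network access and a valid key,
# so it is defined here but not called.
fetch_yelp_page <- function(query, key) {
  resp <- httr::GET(
    "https://api.yelp.com/v3/businesses/search",
    httr::add_headers(Authorization = paste("Bearer", key)),
    query = query
  )
  httr::stop_for_status(resp)
  httr::content(resp, as = "parsed")$businesses
}

# One first-page query per city listed above.
cities <- c("New York City", "San Francisco", "Dallas")
queries <- lapply(cities, build_yelp_query)
str(queries[[1]])
```

To collect more than one page per city, the same query can be repeated with `offset` increased by `limit` each time.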
- Summary of the restaurant data and average review ratings by city.
| city | total_business | total_reviews | avg_rating |
|---|---|---|---|
| NYC | 999 | 331790 | 4.02 |
| Dallas | 569 | 127633 | 3.84 |
| SF | 922 | 493271 | 3.78 |
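A summary of this shape can be produced by grouping the business records by city. The sketch below uses base R on made-up rows. The column names `review_count` and `rating` follow the Yelp field names, and taking `avg_rating` as the unweighted mean of the per-business ratings is an assumption about how the table was computed.

```r
# Toy rows standing in for the Yelp results; the values are invented.
restaurants <- data.frame(
  city = c("NYC", "NYC", "Dallas", "Dallas", "SF"),
  review_count = c(120, 80, 40, 60, 200),
  rating = c(4.5, 4.0, 3.5, 4.0, 4.0)
)

# Split by city, then compute one summary row per city.
by_city <- split(restaurants, restaurants$city)
city_summary <- data.frame(
  city = names(by_city),
  total_business = vapply(by_city, nrow, integer(1)),
  total_reviews = vapply(by_city, function(d) sum(d$review_count), numeric(1)),
  # Assumption: unweighted mean of the per-business ratings.
  avg_rating = vapply(by_city, function(d) round(mean(d$rating), 2), numeric(1)),
  row.names = NULL
)
print(city_summary)
```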
Ratings Score Distribution
Ratings Score Distribution By City
Ratings Score Distribution By Price Level
We can see that most ratings are concentrated at a score of 4. The average rating of Japanese restaurants in New York (4.02) is higher than in the other two cities (Dallas 3.84, San Francisco 3.78).
Frequent words in New York Japanese restaurant reviews
Frequent words in San Francisco Japanese restaurant reviews
Frequent words in Dallas Japanese restaurant reviews
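The word-frequency plots rest on a simple counting step: split each review into words, drop stop words, and count what remains. The sketch below shows that step in base R as a stand-in for the project's own text-processing code, which is not shown here. The reviews and the tiny stop-word list are invented for illustration.

```r
# Invented example reviews; not taken from the Yelp data.
reviews <- c(
  "Great sushi and friendly service",
  "The ramen was great, the sushi was fresh",
  "Service was slow but the ramen was good"
)
# Tiny illustrative stop-word list; a real analysis would use a fuller one.
stop_words <- c("the", "and", "was", "but")

# Lowercase, split on anything that is not a letter, drop empties and stop words.
tokens <- unlist(strsplit(tolower(reviews), "[^a-z]+"))
tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]

# Count each word and sort from most to least frequent.
word_freq <- sort(table(tokens), decreasing = TRUE)
print(head(word_freq, 5))
```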
##
##  Pearson's product-moment correlation
##
## data:  proportion and Dallas
## t = 49.969, df = 674, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8702022 0.9024031
## sample estimates:
##     cor
## 0.88738

##
##  Pearson's product-moment correlation
##
## data:  proportion and Dallas
## t = 70.971, df = 872, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9128449 0.9324908
## sample estimates:
##     cor
## 0.9232693
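The `data: proportion and Dallas` line is what `cor.test()` prints when called through its formula interface on columns named `proportion` and `Dallas`. The sketch below reproduces that kind of call on made-up word proportions. The data frame name `frequency` and all of its values are assumptions for illustration. For a Pearson test the degrees of freedom equal the number of paired words minus 2, so df = 674 and df = 872 above correspond to 676 and 874 words.

```r
# Made-up word proportions for two cities; not the project's data.
frequency <- data.frame(
  word = c("sushi", "ramen", "service", "fresh", "roll", "tempura"),
  proportion = c(0.050, 0.030, 0.020, 0.015, 0.012, 0.004),
  Dallas = c(0.045, 0.020, 0.025, 0.010, 0.015, 0.005)
)

# Pearson correlation between the two columns of word proportions.
result <- cor.test(~ proportion + Dallas, data = frequency)
print(result$estimate)   # sample correlation
print(result$parameter)  # degrees of freedom, n - 2
```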